SASSC: a standard Arabic single speaker corpus

Authors

  • Ibrahim Almosallam
  • Atheer Alkhalifa
  • Mansour Al-Ghamdi
  • Mohamed I. Alkanhal
  • Ashraf Alkhairy
Abstract

This paper describes the process of collecting and recording a large-scale Arabic single-speaker speech corpus. Professional linguists supervised the collection and recording; the corpus was read by a professional speaker in a soundproof studio using specialized equipment and stored in high-quality formats. The speaker's pitch was also recorded with an electroglottograph (EGG) and synchronized with the speech signal. Care was taken to ensure the quality and diversity of the read text, so as to maximize the presence and combinations of words and phonemes. The corpus consists of 51 thousand words that required 7 hours of recording, and it is freely available for academic and research purposes.

Similar resources

Language Variation as a Context for Information Retrieval

Speakers of widespread languages may encounter problems in information retrieval and document understanding when they access documents in the same language from another country. The work described here focuses on the development of resources to support improved document retrieval and understanding by users of Modern Standard Arabic (MSA). The lexicon of an Egyptian Arabic speaker and the lexico...

Crowdsource a little to label a lot: labeling a speech corpus of dialectal Arabic

Arabic is a language with great dialectal variety, with Modern Standard Arabic (MSA) being the only standardized dialect. Spoken Arabic is characterized by frequent code-switching between MSA and Dialectal Arabic (DA). DA varieties are typically differentiated by region, but despite their widespread usage, they are under-resourced and lack viable corpora and tools necessary for speech recognit...

Testing a large corpus of natural standard Arabic for rhythm class

Previous studies using acoustic correlates to measure speech rhythm have used small samples of audio and a limited number of speakers. Few have included standard Arabic in the analysis. This study uses Arabic news broadcasts, along with time-aligned transcripts output by an automatic speech recognizer, to test over 50 minutes of speech by 46 speakers. The results show that Arabic, like Englis...

A New Method for Extracting Named Entities in Classical Arabic

In Natural Language Processing (NLP) studies, developing resources and tools contributes to the extension and effectiveness of research in each language. In recent years, Arabic Named Entity Recognition (ANER) has attracted the attention of NLP researchers because of its significant impact on improving other NLP tasks such as machine translation, information retrieval, question answering, query result...

Constrained Cepstral Speaker Recognition Using Matched UBM and JFA Training

We study constrained speaker recognition systems, or systems that model standard cepstral features that fall within particular types of speech regions. A question in modeling such systems is whether to constrain universal background model (UBM) training, joint factor analysis (JFA), or both. We explore this question, as well as how to optimize UBM model size, using a corpus of Arabic male speak...

Publication date: 2013